Aggressive Morphology for Robust Lexical Coverage

نویسنده

  • William A. Woods
چکیده

This paper describes an approach to providing lexical information for natural language processing in unrestricted domains. A system of approximately 1200 morphological rules is used to extend a core lexicon of 39,000 words to provide lexical coverage that exceeds that of a lexicon of 80,000 words or 150,000 word forms. The morphological system is described, and lexical coverage is evaluated for random words chosen from a previously unanalyzed corpus. email address: [email protected] © 1999 Sun Microsystems, Inc. All rights reserved. The SML Technical Report Series is published by Sun Microsystems Laboratories, of Sun Microsystems, Inc. Printed in U.S.A. Unlimited copying without fee is permitted provided that the copies are not made nor distributed for direct commercial advantage, and credit to the source is given. Otherwise, no part of this work covered by copyright hereon may be reproduced in any form or by any means graphic, electronic, or mechanical, including photocopying, recording, taping, or storage in an information retrieval system, without the prior written permission of the copyright owner. TRADEMARKS Sun, Sun Microsystems, and the Sun logo are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. For information regarding the SML Technical Report Series, contact Jeanie Treichel, Editor-in-Chief . Aggressive Morphology for Robust Lexical Coverage

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Relationship between Iranian EFL Learners' Reading Comprehension, Vocabulary Size and Lexical Coverage of the Text: The Case of Narrative and Argumentative Genres

This study explored the relationship between EFL learners’ vocabulary size, lexical coverage of the text and reading comprehension texts (narrative & argumentative genres). To this end, 120 male and female out of 180 students studying at Talesh Azad University were selected based on their performance on the Nelson Proficiency Test. A Nelson reading proficiency test was also administered in orde...

متن کامل

Constructing and Using Broad-coverage Lexical Resource for Enhancing Morphological Analysis of Arabic

Broad-coverage language resources which provide prior linguistic knowledge must improve the accuracy and the performance of NLP applications. We are constructing a broad-coverage lexical resource to improve the accuracy of morphological analyzers and part-of-speech taggers of Arabic text. Over the past 1200 years, many different kinds of Arabic language lexicons were constructed; these lexicons...

متن کامل

Paradigmatic Morphology and Subjectivity Mark-Up in the RoWordNet Lexical Ontology

Lexical ontologies are fundamental resources for any linguistic application with wide coverage. The reference lexical ontology is the ensemble made of Princeton WordNet, a huge semantic network, and SUMO&MILO ontology, the concepts of which are labelling each synonymic series of Princeton WordNet. This lexical ontology was developed for English language, but currently there are more than 50 sim...

متن کامل

Lexical Entry Templates for Robust Deep Parsing

We report on the development and employment of lexical entry templates in a large–coverage unification–based grammar of Spanish. The aim of the work reported in this paper is to provide robust deep linguistic processing in order to make the grammar more adequate for industrial NLP applications.

متن کامل

Accurate and Robust LFG-Based Generation for Chinese

We describe three PCFG-based models for Chinese sentence realisation from LexicalFunctional Grammar (LFG) f-structures. Both the lexicalised model and the history-based model improve on the accuracy of a simple wide-coverage PCFG model by adding lexical and contextual information to weaken inappropriate independence assumptions implicit in the PCFG models. In addition, we provide techniques for...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2000